522 research outputs found
Space-Time Complexity in Hamiltonian Dynamics
New notions of the complexity function C(epsilon;t,s) and entropy function
S(epsilon;t,s) are introduced to describe systems with nonzero or zero Lyapunov
exponents, or systems that exhibit strong intermittent behavior with
"flights", trappings, weak mixing, etc. The key ingredient of the new notions
is the first appearance of epsilon-separation of initially close trajectories.
The complexity function is similar to the propagator p(t0,x0;t,x), with x
replaced by the natural length s of trajectories, and its introduction
does not assume space-time independence in the evolution of
the system. Special emphasis is placed on the choice of variables: replacing
t by eta=ln(t) and s by xi=ln(s) makes it possible to consider
time-algebraic and space-algebraic complexity and some mixed cases. It is shown
that for typical cases the entropy function S(epsilon;xi,eta) possesses
invariants (alpha,beta) that describe the fractal dimensions of the space-time
structures of trajectories. The invariants (alpha,beta) can be linked to the
transport properties of the system on one side, and to the Riemann
invariants for simple waves on the other. This analogy provides a new
meaning for the transport exponent mu, which can be considered as the speed of a
Riemann wave in the log-phase space of the log-space-time variables. Some other
applications of the new notions are considered and numerical examples are
presented. Comment: 27 pages, 6 figures
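The central quantity above, the first time at which two initially close trajectories separate by epsilon, can be sketched numerically. This is not the authors' code: it assumes the Chirikov standard map as the Hamiltonian system, measures separation as Euclidean distance in (x, p) with x taken on the circle, and takes the natural length s to be the accumulated path length of the reference orbit in the unwrapped (x, p) plane. The function names and default parameters (delta, eps, max_steps) are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi

def standard_map(x, p, K):
    """One step of the Chirikov standard map (x on the circle, p unbounded)."""
    p_new = p + K * math.sin(x)
    x_new = (x + p_new) % TWO_PI
    return x_new, p_new

def circle_diff(a, b):
    """Signed difference a - b on the circle, mapped into [-pi, pi)."""
    return (a - b + math.pi) % TWO_PI - math.pi

def first_separation(x0, p0, K, delta=1e-6, eps=1e-3, max_steps=100_000):
    """Return (t, s): the first step t at which two orbits started delta apart in p
    are more than eps apart, and the path length s of the reference orbit up to t.
    Returns (None, s) if no eps-separation occurs within max_steps."""
    x1, p1 = x0, p0
    x2, p2 = x0, p0 + delta
    s = 0.0
    for t in range(1, max_steps + 1):
        p_old = p1
        x1, p1 = standard_map(x1, p1, K)
        x2, p2 = standard_map(x2, p2, K)
        # Step length in the unwrapped (x, p) plane: x moves by p1, p moves by p1 - p_old.
        s += math.hypot(p1, p1 - p_old)
        if math.hypot(circle_diff(x2, x1), p2 - p1) > eps:
            return t, s
    return None, s
```

For K = 0 the map is integrable and the separation grows linearly, so t is about eps/delta (1000 steps with the defaults); for a strongly chaotic value such as K = 5 it grows roughly exponentially and t is far shorter. Repeating the call over many initial conditions and histogramming (xi, eta) = (ln s, ln t) gives a crude empirical stand-in for the complexity function in log variables.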
A haplome alignment and reference sequence of the highly polymorphic Ciona savignyi genome
The high degree of polymorphism in the genome of the sea squirt Ciona savignyi complicated the assembly of sequence contigs, but a new alignment method results in a much-improved sequence.
VARiD: A variation detection framework for color-space and letter-space platforms
Motivation: High-throughput sequencing (HTS) technologies are transforming the study of genomic variation. The various HTS technologies have different sequencing biases and error rates, and while most HTS technologies sequence the residues of the genome directly, generating base calls for each position, the Applied Biosystems SOLiD platform generates dibase-coded (color-space) sequences. While combining data from the various platforms should increase the accuracy of variation detection, to date there are only a few tools that can identify variants from color-space data, and none that can analyze color-space and regular (letter-space) data together.
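For readers unfamiliar with color space: in SOLiD dibase encoding each color 0-3 encodes a pair of adjacent bases, and with the 2-bit codes A=0, C=1, G=2, T=3 the color is the XOR of the two codes. The sketch below converts between the two representations, assuming a read is stored as its known first base followed by its colors. It is an illustration only, not VARiD's implementation, and the function names are hypothetical.

```python
# 2-bit base codes used by SOLiD dibase encoding; the color of a base pair is their XOR.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"

def to_color_space(seq: str) -> str:
    """Encode a letter-space read as its first base plus one color (0-3) per adjacent base pair."""
    colors = "".join(str(BASE_TO_BITS[a] ^ BASE_TO_BITS[b]) for a, b in zip(seq, seq[1:]))
    return seq[0] + colors

def to_letter_space(cs_read: str) -> str:
    """Decode naively: the previous base and the current color determine the next base."""
    bases = [cs_read[0]]
    for color in cs_read[1:]:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ int(color)])
    return "".join(bases)
```

A true SNP changes two adjacent colors (ATGGCA encodes to A31031, ATGCCA to A31301), whereas a single miscalled color, decoded naively, changes every downstream base (A31131 decodes to ATGTAC). This asymmetry is one reason variant callers for color-space data tend to reason in color space rather than decoding reads first.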
Sequence alignment, mutual information, and dissimilarity measures for constructing phylogenies
Existing sequence alignment algorithms use heuristic scoring schemes that
cannot be used as objective distance metrics. Therefore one relies on measures
like the p- or log-det distances, or makes explicit, and often simplistic,
assumptions about sequence evolution. Information theory provides an
alternative in the form of mutual information (MI), which is, in principle, an
objective and model-independent similarity measure. MI can be estimated by
concatenating and zipping sequences, thereby yielding the "normalized
compression distance". So far this has produced promising results, but with
uncontrolled errors. We describe a simple approach to obtain robust estimates of
MI from global pairwise alignments. Using standard alignment algorithms, this
gives estimates for animal mitochondrial DNA that are strikingly close to
those obtained from the alignment-free methods mentioned above. Our main
result uses algorithmic (Kolmogorov) information theory, but we show that
similar results can also be obtained from Shannon theory. Because it is not
additive, the normalized compression distance is not an optimal metric
for phylogenetics, but we propose a simple modification that overcomes the
issue of additivity. We test several versions of our MI-based distance measures
on a large number of randomly chosen quartets and demonstrate that they all
perform better than traditional measures like the Kimura or log-det (resp.
paralinear) distances. Even a simplified version based on single-letter Shannon
entropies, which can be easily incorporated in existing software packages, gave
superior results throughout the entire animal kingdom. But we see the main
virtue of our approach in its broader applicability. For example, it can also
help to judge the relative merits of different alignment algorithms by
estimating the significance of specific alignments. Comment: 19 pages + 16 pages of supplementary material
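Two of the estimators mentioned above can be sketched in a few lines: the normalized compression distance, NCD(x,y) = (C(xy) - min(C(x),C(y))) / max(C(x),C(y)), and a single-letter Shannon estimate of MI from an existing pairwise alignment, I = H(X) + H(Y) - H(X,Y) over aligned columns. This is a minimal illustration, not the authors' method: zlib stands in for the compressor, gap columns are simply skipped, the paper's additive modification is not reproduced, and the helper names (including the test-data generator random_dna) are hypothetical.

```python
import math
import random
import zlib
from collections import Counter

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (a stand-in for C(.))."""
    return len(zlib.compress(data, 9))

def ncd(x: str, y: str) -> float:
    """Normalized compression distance: near 0 for near-identical inputs, near 1 for unrelated ones."""
    cx = compressed_size(x.encode())
    cy = compressed_size(y.encode())
    cxy = compressed_size((x + y).encode())
    return (cxy - min(cx, cy)) / max(cx, cy)

def entropy(counts: Counter) -> float:
    """Shannon entropy in bits of an empirical distribution given as counts."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def single_letter_mi(aligned_x: str, aligned_y: str) -> float:
    """MI in bits per column, I = H(X) + H(Y) - H(X,Y), from the columns of a
    pairwise alignment; columns with a gap ('-') in either row are skipped."""
    pairs = [(a, b) for a, b in zip(aligned_x, aligned_y) if a != "-" and b != "-"]
    h_x = entropy(Counter(a for a, _ in pairs))
    h_y = entropy(Counter(b for _, b in pairs))
    h_xy = entropy(Counter(pairs))
    return h_x + h_y - h_xy

def random_dna(n: int, seed: int) -> str:
    """Hypothetical test data: an i.i.d. uniform random nucleotide string."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))
```

With real data, aligned_x and aligned_y would be the two gapped rows produced by a standard global aligner; note that this single-letter MI ignores context along the sequence, which is what makes it cheap to add to existing software.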
MISHIMA - a new method for high speed multiple alignment of nucleotide sequences of bacterial genome scale data
Background: Large nucleotide sequence datasets are becoming increasingly common objects of comparison. Complete bacterial genomes are reported almost every day. This creates challenges for developing new multiple sequence alignment methods. Conventional multiple alignment methods are based on pairwise alignment and/or progressive alignment techniques. These approaches have performance problems when the number of sequences is large and when dealing with genome-scale sequences.
Results: We present a new method of multiple sequence alignment, called MISHIMA (Method for Inferring Sequence History In terms of Multiple Alignment), that does not depend on pairwise sequence comparison. A new algorithm is used to quickly find rare oligonucleotide sequences shared by all sequences. A divide-and-conquer approach is then applied to break the sequences into fragments that can be aligned independently by an external alignment program. These partial alignments are assembled to form a complete alignment of the original sequences.
Conclusions: MISHIMA provides improved performance compared to the commonly used multiple alignment methods. As an example, six complete genome sequences of the bacterial species Helicobacter pylori (about 1.7 Mb each) were successfully aligned in about 6 hours on a single PC.
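The divide-and-conquer step can be sketched as follows. The abstract does not specify how MISHIMA selects its rare shared oligonucleotides, so this sketch assumes a simpler rule: k-mers that occur exactly once in every sequence, kept greedily in an order that is collinear and non-overlapping across all sequences. The function names are hypothetical; each gap block returned would be handed to an external aligner, and the per-block alignments concatenated.

```python
from collections import Counter

def unique_kmers(seq: str, k: int) -> dict:
    """Map each k-mer occurring exactly once in seq to its start position."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}

def shared_anchors(seqs: list, k: int) -> list:
    """Anchors: k-mers unique in every sequence, taken greedily in order of their
    position in seqs[0] and kept only if non-overlapping and increasing in all
    sequences. Each anchor is a list of start positions, one per sequence."""
    maps = [unique_kmers(s, k) for s in seqs]
    common = set(maps[0]).intersection(*maps[1:])
    anchors, next_free = [], [0] * len(seqs)
    for kmer in sorted(common, key=lambda m: maps[0][m]):
        pos = [m[kmer] for m in maps]
        if all(p >= f for p, f in zip(pos, next_free)):
            anchors.append(pos)
            next_free = [p + k for p in pos]
    return anchors

def split_at_anchors(seqs: list, anchors: list, k: int) -> list:
    """Cut all sequences at the anchors. Returns blocks, each a tuple with one
    fragment per sequence: gap blocks (possibly empty) go to an external aligner,
    anchor blocks are identical and therefore already aligned."""
    blocks, start = [], [0] * len(seqs)
    for pos in anchors:
        blocks.append(tuple(s[b:p] for s, b, p in zip(seqs, start, pos)))  # gap before anchor
        blocks.append(tuple(s[p:p + k] for s, p in zip(seqs, pos)))        # the anchor itself
        start = [p + k for p in pos]
    blocks.append(tuple(s[b:] for s, b in zip(seqs, start)))               # tail after last anchor
    return blocks
```

Because the anchors are increasing and non-overlapping in every sequence, concatenating the j-th fragment of each block reproduces sequence j exactly, so no bases are lost when the partial alignments are reassembled.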
Recurrence and algorithmic information
In this paper we initiate a somewhat detailed investigation of the
relationships between quantitative recurrence indicators and algorithmic
complexity of orbits in weakly chaotic dynamical systems. We mainly focus on
examples. Comment: 26 pages, no figures
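The abstract does not define its recurrence indicators, so the sketch below assumes a standard one: the first return time tau_r(x), the smallest n >= 1 with d(T^n x, x) < r on the circle [0, 1), and the finite-r ratio log tau_r(x) / (-log r), whose small-r limit is the usual recurrence rate. The golden-mean rotation is a hypothetical choice of zero-entropy example system, not one taken from the paper, and all names are illustrative.

```python
import math

def first_return_time(T, x, r, max_steps=10**6):
    """Smallest n >= 1 with circle distance d(T^n x, x) < r, or None if not found."""
    y = x
    for n in range(1, max_steps + 1):
        y = T(y)
        d = abs(y - x) % 1.0
        if min(d, 1.0 - d) < r:
            return n
    return None

def recurrence_ratio(T, x, r):
    """Finite-r estimate of log tau_r(x) / (-log r)."""
    tau = first_return_time(T, x, r)
    return math.log(tau) / -math.log(r) if tau else None

GOLDEN = (math.sqrt(5.0) - 1.0) / 2.0

def golden_rotation(y):
    """Hypothetical example system: rotation of the circle [0, 1) by the golden mean."""
    return (y + GOLDEN) % 1.0
```

For this rotation the return times are Fibonacci numbers (5 for r = 0.1, 55 for r = 0.01), and the ratio creeps toward 1 as r shrinks. Pairing such return times with a compression-based estimate of the orbit's symbolic complexity is one way to probe, on examples, the relationship the paper studies.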
M-GCAT: interactively and efficiently constructing large-scale multiple genome comparison frameworks in closely related species
BACKGROUND: Due to recent advances in whole-genome shotgun sequencing and assembly technologies, the financial cost of decoding an organism's DNA has been drastically reduced, resulting in a recent explosion of genomic sequencing projects. This increase in related genomic data will allow for in-depth studies of evolution in closely related species through multiple whole-genome comparisons. RESULTS: To facilitate such comparisons, we present an interactive multiple genome comparison and alignment tool, M-GCAT, that can efficiently construct multiple genome comparison frameworks in closely related species. M-GCAT is able to compare and identify highly conserved regions in up to 20 closely related bacterial species in minutes on a standard computer, and in as many as 90 (containing 75 cloned genomes from a set of 15 published enterobacterial genomes) in an hour. M-GCAT also incorporates a novel comparative genomics data visualization interface that allows the user to examine and inspect the conserved regions and gene annotations both globally and locally. CONCLUSION: M-GCAT is an interactive comparative genomics tool well suited for quickly generating multiple genome comparison frameworks and alignments among closely related species. M-GCAT is freely available for download for academic and non-commercial use at:
Savant Genome Browser 2: visualization and analysis for population-scale genomics
High-throughput sequencing (HTS) technologies are providing an unprecedented capacity for data generation, and there is a corresponding need for efficient data exploration and analysis capabilities. Although most existing tools for HTS data analysis are developed for either automated (e.g. genotyping) or visualization (e.g. genome browsing) purposes, such tools are most powerful when combined. For example, integration of visualization and computation allows users to iteratively refine their analyses by updating computational parameters within the visual framework in real time. Here we introduce the second version of the Savant Genome Browser, a standalone program for visual and computational analysis of HTS data. Savant substantially improves upon its predecessor and existing tools by introducing innovative visualization modes and navigation interfaces for several genomic datatypes, and by synergizing visual and automated analyses in a way that is powerful yet easy to use even for non-expert users. We also present a number of plugins developed by the Savant Community, which demonstrate the power of integrating visual and automated analyses using Savant. The Savant Genome Browser is freely available (open source) at www.savantbrowser.co
Complexity for extended dynamical systems
We consider dynamical systems for which the spatial extension plays an
important role. For these systems, the notions of attractor, epsilon-entropy
and topological entropy per unit time and volume have been introduced
previously. In this paper we use the notion of Kolmogorov complexity to
introduce, for extended dynamical systems, a notion of complexity per unit time
and volume which plays the same role as the metric entropy for classical
dynamical systems. We introduce this notion as an almost sure limit on orbits
of the system. Moreover we prove a kind of variational principle for this
complexity. Comment: 29 pages
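The notion above rests on Kolmogorov complexity, which is not computable; a common practical proxy is the length of a compressed encoding, which only gives an upper bound. The sketch below assumes an elementary cellular automaton with periodic boundaries as a stand-in extended system and uses zlib-compressed bits per space-time cell as a crude estimate of complexity per unit time and volume. The choice of system, the seeded random initial row, and the helper names are all hypothetical, not the paper's construction.

```python
import random
import zlib

def eca_spacetime(rule, width, steps, seed=0):
    """Space-time pattern (list of rows) of an elementary cellular automaton
    with periodic boundaries and a seeded random initial row."""
    rng = random.Random(seed)
    row = [rng.randint(0, 1) for _ in range(width)]
    rows = [row]
    for _ in range(steps - 1):
        # The neighbourhood (left, centre, right) selects one bit of the Wolfram rule number.
        row = [(rule >> (4 * row[i - 1] + 2 * row[i] + row[(i + 1) % width])) & 1
               for i in range(width)]
        rows.append(row)
    return rows

def complexity_density(rows):
    """Compressed bits per cell: a crude upper-bound proxy for complexity per unit time and volume."""
    data = bytes(cell for row in rows for cell in row)
    return 8 * len(zlib.compress(data, 9)) / len(data)
```

Rule 0 dies out after one step and should give a density close to zero, while rule 30 should give a density of order one bit per cell. For an estimate closer to a per-unit-time-and-volume limit, one would grow width and steps together and check whether the ratio settles.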